[SPARK-25877][k8s] Move all feature logic to feature classes. #23220

Closed
vanzin wants to merge 2 commits into apache:master from vanzin:SPARK-25877

Conversation


@vanzin vanzin commented Dec 5, 2018

This change makes the driver and executor builders a lot simpler
by encapsulating almost all feature logic in the respective
feature classes. The only logic that remains is the creation of
the initial pod, which needs to happen before anything else, so
it is better left in the builder class.

Most feature classes already behave correctly when the config has
nothing for them to handle, but a few minor tweaks were needed.
Unit tests were updated or added to account for these.

The builder test suites were simplified considerably and now only
exercise the pod-related code that remains in the builders themselves.
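
For illustration, the resulting builder shape is roughly the following. This is a simplified sketch, not the actual Spark code: Pod, FeatureStep, LabelsFeature and PodBuilderSketch below are stand-ins for the real SparkPod, KubernetesFeatureConfigStep and builder classes.

case class Pod(containers: List[String], labels: Map[String, String])

trait FeatureStep {
  // A feature that has nothing to do for the current config simply
  // returns the pod unchanged.
  def configurePod(pod: Pod): Pod
}

class LabelsFeature(labels: Map[String, String]) extends FeatureStep {
  override def configurePod(pod: Pod): Pod =
    pod.copy(labels = pod.labels ++ labels)
}

object PodBuilderSketch {
  // The builder only creates the initial pod and then folds every
  // feature step over it, in order; no per-feature logic lives here.
  def buildPod(features: Seq[FeatureStep]): Pod = {
    val initialPod = Pod(containers = List("driver"), labels = Map.empty)
    features.foldLeft(initialPod)((pod, step) => step.configurePod(pod))
  }
}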

SparkQA commented Dec 5, 2018

Test build #99687 has finished for PR 23220 at commit a13bafd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 5, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/5739/

SparkQA commented Dec 7, 2018

Test build #99800 has finished for PR 23220 at commit e217e56.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class ArrowCollectSerializer(Serializer):

SparkQA commented Dec 7, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/5837/


vanzin commented Dec 11, 2018

So, anybody interested in reviewing this?

@mccheah mccheah left a comment

The main code looks a lot better, thanks! I have a concern about the tests.

val spec = pod.pod.getSpec
assert(!spec.getContainers.asScala.exists(_.getName == "executor-container"))
assert(spec.getDnsPolicy === "dns-policy")
assert(spec.getHostAliases.asScala.exists(_.getHostnames.asScala.exists(_ == "hostname")))
@mccheah mccheah Dec 11, 2018

Nit: The second call to exists can be contains instead, so that we don't pass a function object that ignores the argument. Alternatively, both exists calls can be removed:

spec.getHostAliases.asScala.flatMap(_.getHostnames.asScala).contains("hostname")

Contributor Author

I'm not modifying this code, just moving it from its previous location.

assert(metadata.getNamespace === "namespace")
assert(metadata.getOwnerReferences.asScala.exists(_.getName == "owner-reference"))
val spec = pod.pod.getSpec
assert(!spec.getContainers.asScala.exists(_.getName == "executor-container"))
Contributor

Nit: Use .asScala.map(_.getName).contains("executor-container"). Similar changes come up throughout this test.
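
For reference, the two assertion styles compare like this in a self-contained sketch; Container here is a stand-in for the Fabric8 model class, not the real type.

import scala.collection.JavaConverters._

// Stand-in for the Kubernetes Container model: just a name with a Java-style getter.
case class Container(getName: String)

object AssertionStyleSketch {
  def main(args: Array[String]): Unit = {
    val containers: java.util.List[Container] =
      java.util.Arrays.asList(Container("driver"), Container("sidecar"))
    // Style used in the test as moved: a predicate passed to exists.
    assert(!containers.asScala.exists(_.getName == "executor-container"))
    // Style suggested in the nit: map to the names, then use contains.
    assert(!containers.asScala.map(_.getName).contains("executor-container"))
  }
}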

Contributor Author

I'm not modifying this code, just moving it from its previous location.

kubernetesClient
}

private def verifyPod(pod: SparkPod): Unit = {
Contributor

The number of things this test checks is remarkable, and it is very much possible to accidentally omit checking the application of a specific feature when a new one is added for either the driver or executor. This is why we had the overridable feature steps in the original incarnation of these tests. Not mocking the substeps leads us to need to check that some specific aspect of each step has been applied. Can we go back to mocking the different steps so that this test can be more easily modified when we add more features? Or else can we abstract away the idea that these steps are applied without this test itself needing to know what the step itself actually does?
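
A rough sketch of the mocked-step approach the comment refers to, assuming the builder suite should know nothing about what real steps do; the names below (MarkerStep, BuilderSuiteSketch) are illustrative, not the actual Spark test helpers. Each stubbed step only tags the pod, and the builder test verifies only that every tag is present.

case class Pod(annotations: Map[String, String])

trait FeatureStep {
  def configurePod(pod: Pod): Pod
}

// A stand-in step that only records that it ran.
class MarkerStep(id: String) extends FeatureStep {
  override def configurePod(pod: Pod): Pod =
    pod.copy(annotations = pod.annotations + (s"applied-$id" -> "true"))
}

object BuilderSuiteSketch {
  def buildPod(steps: Seq[FeatureStep]): Pod =
    steps.foldLeft(Pod(Map.empty))((pod, step) => step.configurePod(pod))

  def main(args: Array[String]): Unit = {
    // The builder test then checks only that each step was applied.
    val pod = buildPod(Seq(new MarkerStep("basic"), new MarkerStep("secrets")))
    assert(Seq("basic", "secrets").forall(id => pod.annotations.contains(s"applied-$id")))
  }
}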

Contributor

Another factor in my concern is that, for each individual assertion, it is unclear which step the assertion is tied to. This reads a lot more like an end-to-end test than a unit test.

Contributor Author

I actually did not write this test. I copy & pasted it with zero modifications from the previous class, and I'd prefer to keep it that way.

That's also an argument for not restoring the mocks, which would go against what this change is doing. This test should account for modifications made by other steps, since if they modify something unexpected, that can change the semantics of the feature (pod template support).

Contributor

> That's also an argument for not restoring the mocks, which would go against what this change is doing. This test should account for modifications made by other steps, since if they modify something unexpected, that can change the semantics of the feature (pod template support).

Wouldn't most of those unexpected changes come from the unit tests of the individual steps? Granted, this test can catch when a change in one step impacts behavior in another step, which is important. Given that this isn't changing prior code, I'm fine with leaving this as-is and addressing it again later if it becomes a problem.


mccheah commented Dec 11, 2018

+1 from me, would like @liyinan926 to take a second look

@liyinan926 liyinan926 left a comment

LGTM


mccheah commented Dec 12, 2018

Merging to master

@asfgit asfgit closed this in a63e7b2 Dec 12, 2018
@vanzin vanzin deleted the SPARK-25877 branch December 12, 2018 20:37
holdenk pushed a commit to holdenk/spark that referenced this pull request Jan 5, 2019

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#23220 from vanzin/SPARK-25877.
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019